Consumer Complaint Database

An overview of consumer complaint behavior and company responses

By Luis Alarcon

Disclaimer

I endeavour in this project out of my own volition. It is the first time that I come in contact with this data. I carry no bias before carrying this Exploratory Data Analysis, and the primary goal is to further my knowledge in the methods as well as an understanding of different areas through data analysis.

Introduction

Dataset General Information

The full name of the data set is “Inpatient Prospective Payment System (IPPS) Provider Summary for the Top 100 Diagnosis-Related Groups (DRG) - FY2011”. It is own and updated by CMS.

The last update was on May 29, 2014, and the metadata was updated on August 30, 2017. For access to the data set as well as further information, please go to this link.

Dataset Contents

The Dataset has 12 columns. These are the names and the description of them:

This information it is copied straight from the pdf attachment found in this link.
  1. DRG Definition
  2. The code and description identifying the MS-DRG. MS-DRGs are a 
    classification system that groups similar clinical conditions (diagnoses) 
    and the procedures furnished by the hospital during the stay.
  3. Provider Id
  4. The CMS Certification Number (CCN) assigned to the Medicare certified 
    hospital facility.
  5. Provider Name
  6. The name of the provider.
  7. Provider Street Address
  8. The provider’s street address.
  9. Provider City
  10. The city where the provider is located.
  11. Provider State
  12. The state where the provider is located.
  13. Provider Zip Code
  14. The provider’s zip code.
  15. Provider HRR
  16. The Hospital Referral Region (HRR) where the provider is located.
  17. Total Discharges
  18. The number of discharges billed by the provider for inpatient 
    hospital services.
  19. Average Covered Charges
  20. The provider's average charge for services covered by Medicare 
    for all discharges in the MS-DRG. These will vary from hospital to 
    hospital because of differences in hospital charge structures.
  21. Average Total Payments
  22. The average total payments to all providers for the MS-DRG including 
    the MS-DRG amount, teaching, disproportionate share, capital, and outlier
    payments for all cases. Also included in average total payments are 
    co-payment and deductible amounts that the patient is responsible for and 
    any additional payments by third parties for coordination of benefits.
  23. Average Medicare Payments
  24. The average amount that Medicare pays to the provider for Medicare's share 
    of the MS-DRG. Average Medicare payment amounts include the MS-DRG amount, 
    teaching, disproportionate share, capital, and outlier payments for all 
    cases. Medicare payments DO NOT include beneficiary co-payments and 
    deductible amounts norany additional payments from third parties for 
    coordination of benefits. Note: In general, Medicare FFS claims with 
    dates-of-service or dates-of-discharge on or after April 1, 2013, incurred 
    a 2 percent reduction in Medicare payment. This is in response to mandatory 
    across-the-board reductions in Federal spending, also known as 
    sequestration. For additional information, <a href= "http://www.cms.gov/Outreach-and-Education/Outreach/FFSProvPartProg/Downloads/
    2013-03-08-standalone.pdf">visit</a>. 

The initial process of refining the dataset was done in a python script named “u_project_EDA-CMS_python3.ipynb”, the code is presented in “u_project_EDA-CMS_python3.html”; for further details examine the HTML file.

Exploratory Data Analysis (EDA)

EDA: Initial Data Exploration

Data Summary

Firstly, we load the packages that we are going to use in this project.

Secondly, we load the dataset from its original location to the data frame df_cms.

##                             DRG.Definition Provider.Id
## 1 039 - EXTRACRANIAL PROCEDURES W/O CC/MCC       10001
## 2 039 - EXTRACRANIAL PROCEDURES W/O CC/MCC       10005
## 3 039 - EXTRACRANIAL PROCEDURES W/O CC/MCC       10006
## 4 039 - EXTRACRANIAL PROCEDURES W/O CC/MCC       10011
## 5 039 - EXTRACRANIAL PROCEDURES W/O CC/MCC       10016
## 6 039 - EXTRACRANIAL PROCEDURES W/O CC/MCC       10023
##                      Provider.Name    Provider.Street.Address
## 1 SOUTHEAST ALABAMA MEDICAL CENTER     1108 ROSS CLARK CIRCLE
## 2    MARSHALL MEDICAL CENTER SOUTH 2505 U S HIGHWAY 431 NORTH
## 3   ELIZA COFFEE MEMORIAL HOSPITAL         205 MARENGO STREET
## 4                ST VINCENT'S EAST 50 MEDICAL PARK EAST DRIVE
## 5    SHELBY BAPTIST MEDICAL CENTER    1000 FIRST STREET NORTH
## 6     BAPTIST MEDICAL CENTER SOUTH  2105 EAST SOUTH BOULEVARD
##   Provider.City Provider.State Provider.Zip.Code
## 1        DOTHAN             AL             36301
## 2          BOAZ             AL             35957
## 3      FLORENCE             AL             35631
## 4    BIRMINGHAM             AL             35235
## 5     ALABASTER             AL             35007
## 6    MONTGOMERY             AL             36116
##   Hospital.Referral.Region.Description Total.Discharges
## 1                          AL - Dothan               91
## 2                      AL - Birmingham               14
## 3                      AL - Birmingham               24
## 4                      AL - Birmingham               25
## 5                      AL - Birmingham               18
## 6                      AL - Montgomery               67
##   Average.Covered.Charges Average.Total.Payments Average.Medicare.Payments
## 1                32963.07                5777.24                   4763.73
## 2                15131.85                5787.57                   4976.71
## 3                37560.37                5434.95                   4453.79
## 4                13998.28                5417.56                   4129.16
## 5                31633.27                5658.33                   4851.44
## 6                16920.79                6653.80                   5374.14
##   Census.Region Census.Region.Division Federal.Region
## 1         SOUTH  D6_EAST_SOUTH_CENTRAL      REGION_IV
## 2         SOUTH  D6_EAST_SOUTH_CENTRAL      REGION_IV
## 3         SOUTH  D6_EAST_SOUTH_CENTRAL      REGION_IV
## 4         SOUTH  D6_EAST_SOUTH_CENTRAL      REGION_IV
## 5         SOUTH  D6_EAST_SOUTH_CENTRAL      REGION_IV
## 6         SOUTH  D6_EAST_SOUTH_CENTRAL      REGION_IV
##   Economic.Analysis.Region
## 1                SOUTHEAST
## 2                SOUTHEAST
## 3                SOUTHEAST
## 4                SOUTHEAST
## 5                SOUTHEAST
## 6                SOUTHEAST

Before we continue, we check if there are any null values within the data frame df_cms.

##  [1] DRG.Definition                      
##  [2] Provider.Id                         
##  [3] Provider.Name                       
##  [4] Provider.Street.Address             
##  [5] Provider.City                       
##  [6] Provider.State                      
##  [7] Provider.Zip.Code                   
##  [8] Hospital.Referral.Region.Description
##  [9] Total.Discharges                    
## [10] Average.Covered.Charges             
## [11] Average.Total.Payments              
## [12] Average.Medicare.Payments           
## [13] Census.Region                       
## [14] Census.Region.Division              
## [15] Federal.Region                      
## [16] Economic.Analysis.Region            
## <0 rows> (or 0-length row.names)

There are no null values in the data frame. Afterwards, we check the number of entries present in this dataset.

## [1] 163065     16

The number of entries is 163065. The number of entries is significant. Therefore, We proceed to make a quick summary of the data.

##                                                                   DRG.Definition  
##  194 - SIMPLE PNEUMONIA & PLEURISY W CC                                  :  3023  
##  690 - KIDNEY & URINARY TRACT INFECTIONS W/O MCC                         :  2989  
##  292 - HEART FAILURE & SHOCK W CC                                        :  2953  
##  392 - ESOPHAGITIS, GASTROENT & MISC DIGEST DISORDERS W/O MCC            :  2950  
##  641 - MISC DISORDERS OF NUTRITION,METABOLISM,FLUIDS/ELECTROLYTES W/O MCC:  2899  
##  871 - SEPTICEMIA OR SEVERE SEPSIS W/O MV 96+ HOURS W MCC                :  2812  
##  (Other)                                                                 :145439  
##   Provider.Id                       Provider.Name   
##  Min.   : 10001   GOOD SAMARITAN HOSPITAL  :   633  
##  1st Qu.:110092   ST JOSEPH MEDICAL CENTER :   427  
##  Median :250007   MERCY MEDICAL CENTER     :   357  
##  Mean   :255570   MERCY HOSPITAL           :   347  
##  3rd Qu.:380075   ST JOSEPH HOSPITAL       :   343  
##  Max.   :670077   ST FRANCIS MEDICAL CENTER:   277  
##                   (Other)                  :160681  
##              Provider.Street.Address      Provider.City   
##  100 MEDICAL CENTER DRIVE:   183     CHICAGO     :  1505  
##  800 WASHINGTON STREET   :   166     BALTIMORE   :  1059  
##  1 MEDICAL CENTER DRIVE  :   142     HOUSTON     :   950  
##  100 HOSPITAL DRIVE      :   106     PHILADELPHIA:   898  
##  100 MICHIGAN ST NE      :   100     BROOKLYN    :   877  
##  1000 BLYTHE BLVD        :   100     SPRINGFIELD :   807  
##  (Other)                 :162268     (Other)     :156969  
##  Provider.State   Provider.Zip.Code Hospital.Referral.Region.Description
##  CA     : 13064   Min.   : 1040     CA - Los Angeles :  3653            
##  TX     : 11864   1st Qu.:27261     MA - Boston      :  2910            
##  FL     : 11155   Median :44309     GA - Atlanta     :  2630            
##  NY     :  9178   Mean   :47938     TX - Houston     :  2577            
##  IL     :  7909   3rd Qu.:72901     PA - Philadelphia:  2554            
##  PA     :  7804   Max.   :99835     TX - Dallas      :  2427            
##  (Other):102091                     (Other)          :146314            
##  Total.Discharges  Average.Covered.Charges Average.Total.Payments
##  Min.   :  11.00   Min.   :  2459          Min.   :  2673        
##  1st Qu.:  17.00   1st Qu.: 15947          1st Qu.:  5234        
##  Median :  27.00   Median : 25246          Median :  7214        
##  Mean   :  42.78   Mean   : 36134          Mean   :  9707        
##  3rd Qu.:  49.00   3rd Qu.: 43233          3rd Qu.: 11286        
##  Max.   :3383.00   Max.   :929119          Max.   :156158        
##                                                                  
##  Average.Medicare.Payments   Census.Region  
##  Min.   :  1149            MIDWEST  :39087  
##  1st Qu.:  4192            NORTHEAST:30230  
##  Median :  6158            SOUTH    :67038  
##  Mean   :  8494            WEST     :26710  
##  3rd Qu.: 10057                             
##  Max.   :154621                             
##                                             
##            Census.Region.Division    Federal.Region 
##  D5_SOUTH_ATLANTIC    :34118      REGION_IV :37453  
##  D3_EAST_NORTH_CENTRAL:27434      REGION_V  :29686  
##  D2_MID-ATLANTIC      :21808      REGION_VI :20395  
##  D7_WEST_SOUTH_CENTRAL:19478      REGION_III:17911  
##  D9_PACIFIC           :17814      REGION_IX :17559  
##  D6_EAST_SOUTH_CENTRAL:13442      REGION_II :14004  
##  (Other)              :28971      (Other)   :26057  
##  Economic.Analysis.Region
##  SOUTHEAST  :48468       
##  GREAT LAKES:27434       
##  MIDEAST    :25994       
##  FAR WEST   :19016       
##  SOUTHWEST  :18152       
##  PLAINS     :11653       
##  (Other)    :12348

Due to the vast array of data and groups found in this particular data frame. We will begin our analysis per regions. We will start with “Census Region” since it encompasses all states within the four central regions of the USA, the divisions within regions, the Federal bank that is in charge of the state as well as the Regions established by the Bureau of Economic Analysis.

Univariate Plots Section

This graph contains the number of entry per each of the groups mentioned above.

Univariate Analysis Section

We will start the initial analysis by obtaining the frequency for each one of the variables of the columns:

  1. “Census.Region”

  2. “Census Region Division”

  3. “Federal Region”

  4. “Bureau of Economic Analysis Region”

##   Census.Region  freq
## 1       MIDWEST 39087
## 2     NORTHEAST 30230
## 3         SOUTH 67038
## 4          WEST 26710
##   Census.Region.Division  freq
## 1         D1_NEW_ENGLAND  8422
## 2        D2_MID-ATLANTIC 21808
## 3  D3_EAST_NORTH_CENTRAL 27434
## 4  D4_WEST_NORTH_CENTRAL 11653
## 5      D5_SOUTH_ATLANTIC 34118
## 6  D6_EAST_SOUTH_CENTRAL 13442
## 7  D7_WEST_SOUTH_CENTRAL 19478
## 8            D8_MOUNTAIN  8896
## 9             D9_PACIFIC 17814
##    Federal.Region  freq
## 1        REGION_I  8422
## 2       REGION_II 14004
## 3      REGION_III 17911
## 4       REGION_IV 37453
## 5       REGION_IX 17559
## 6        REGION_V 29686
## 7       REGION_VI 20395
## 8      REGION_VII  8493
## 9     REGION_VIII  4283
## 10       REGION_X  4859
##   Economic.Analysis.Region  freq
## 1                 FAR WEST 19016
## 2              GREAT LAKES 27434
## 3                  MIDEAST 25994
## 4              NEW ENGLAND  8422
## 5                   PLAINS 11653
## 6           ROCKY MOUNTAIN  3926
## 7                SOUTHEAST 48468
## 8                SOUTHWEST 18152

In the Census Region, the region South has the highest account of entries with 67038 and the lowest in the West with 26710. In the Divisions in the Census Region, the Division 5 South Atlantic has the hights entries with 34116, and the lowest in the Division 1 New England.

When considering Regions under the different Federal Banks, Region IV is the one with the highest entry and the region with the lowest entries is Region_XIII with 4283. Lastly, The Rocky Mountain Region of the Bureau of Economic Analysis is the region with the lowest entries in 3926. On the other hand, Southeast is the region with the highest entries with 48468.

Before we continue, we will get the summary of the Total Discharges, Average Covered Charges, Average Total Payments and Average Medicare Payments per Census Region.

## df_cms$Census.Region: MIDWEST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.00   17.00   28.00   43.87   50.00 1487.00 
## -------------------------------------------------------- 
## df_cms$Census.Region: NORTHEAST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.00   18.00   29.00   45.83   53.00 3383.00 
## -------------------------------------------------------- 
## df_cms$Census.Region: SOUTH
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.00   17.00   27.00   43.81   50.00 1344.00 
## -------------------------------------------------------- 
## df_cms$Census.Region: WEST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.00   15.00   23.00   35.13   39.00 1696.00
## df_cms$Census.Region: MIDWEST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3611   14792   21413   29395   34654  353774 
## -------------------------------------------------------- 
## df_cms$Census.Region: NORTHEAST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3481   14964   25870   37123   45428  613927 
## -------------------------------------------------------- 
## df_cms$Census.Region: SOUTH
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2459   15416   23890   33361   40136  480540 
## -------------------------------------------------------- 
## df_cms$Census.Region: WEST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4202   23310   37111   51836   61668  929119
## df_cms$Census.Region: MIDWEST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2673    5062    6897    9228   10837  131187 
## -------------------------------------------------------- 
## df_cms$Census.Region: NORTHEAST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2765    5610    7824   10515   12138  140255 
## -------------------------------------------------------- 
## df_cms$Census.Region: SOUTH
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2683    4878    6686    8941   10460   99307 
## -------------------------------------------------------- 
## df_cms$Census.Region: WEST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2995    6134    8525   11419   13295  156158
## df_cms$Census.Region: MIDWEST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1327    4016    5850    7991    9613  130467 
## -------------------------------------------------------- 
## df_cms$Census.Region: NORTHEAST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1680    4597    6818    9366   11009  133177 
## -------------------------------------------------------- 
## df_cms$Census.Region: SOUTH
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1149    3849    5640    7746    9303   95701 
## -------------------------------------------------------- 
## df_cms$Census.Region: WEST
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1604    5034    7429   10125   11900  154621

EDA: Furthe Data Exploration

Bivariate Plots Section

Bivariate Plots Section: Plots

In order to continue this analysis, we will proceed to see the relationship between Total Discharges, Averaged Covered Charges (USD), Average Total Payments (USD) and Average Medicare Payments (USD).

From the graph, we can see that the relationship between Total Discharges and Average Covered Charges is that the fewer discharges the more Average Covered Charges. This trend is the same when we compare Total Discharges is plotted alongside Average Total Payments and Average Medicare Payments.

When a correlation test is performed between Total Discharges and the other variables these are the results:

  • Total Discharges vs Average Covered Charges(USD):
  • Corr: -0.025
  • Total Discharges vs Average Total Payments(USD):
  • Corr: 0.0151
  • Total DIscharges vs Average Medicare Payments(USD):
  • Corr:0.0121

The results indicate that there is no correlation between the number of discharges and the payments made or received by patients or Medicare.

The graph also shows the relationship between Average Covered Charges(USD) and Average Total Payments(USD) and Average Medicare Payments(USD). The graph shows almost a more linear positive relationship between Average Covered Charges(USD) and Average Total Payments(USD) and Average Medicare Payments (USD).

The graph is supported by the data extracted in the correlation test between the variables:

  • Average Covered Charges(USD) vs Average Total Payments(USD):
  • Corr: 0.772
  • Average Covered Charges(USD) vs Average Medicare Payments(USD):
  • Corr: 0.764

    The results indicate that there is a positive relationship between the variable and due to the similarity in results it further supports the similarity between the two relationships.

    The last relationship that could be explored from the graph is between Average Total Payments(USD) and Average Medicare Payments(USD). The graph shows a very linear relationship between the two variables, and this is further supported by the correlation test showing a Corr value of 0.992.

    Multivariable Section

    Multivariable Section: Plots

    We will start by analysing the following group of variables together:

    1. Average Covered Charges vs Average Medicare Payments: This comparison was chosen because Average Covered Charges are the average charge for services covered by Medicare for all discharges in the-DRG and Average Medicare Payments are the average amount that Medicare pays providers for Medicare’s share of the MS-DRG.
    2. Total Discharges vs Average Total Payments: This comparison was chosen because the total discharges account for the number of discharges billed by the provider for inpatient hospital services and the average total payments account for the total the average total payments to all providers for the MS-DRG; including partial coverage by the patient.
    3. Average Total Payments vs Average Medicare Payments.
    Average Covered Charges vs Average Medicare Payments
    CC-vs-MP:Plots

    The first graph that is going to show the whole data from df_cms. The whole data is graphed in this way in order to acquire the whole pattern of the data between the two variables.

    As seen on the graph, the majority of the data is under the x-value 250000$ and y-value 50000$. Due to this the second graph will have the limits mentioned above and also 4 further graphs will be added; the entire data frame data selectively gather under the four census regions of the USA, Midwest, Northeast, South and West.

    As we can see from the graph West and Midwest regions, have a similar scatter plot, flatter and more left-leaning. However, Midwest has a more linear shape than the West region.

    Northeast and South regions have similar scatterplots with widening x and y profile than other regions. However, the South region have a straight line. It will be interesting to know why it was developed.

    CC-vs-MP:Analysis

    We start the analysis by determining the Correlation between the two variables. We will perform a Pearson test between the two variables without subdividing the data into regions.

    ## 
    ##  Pearson's product-moment correlation
    ## 
    ## data:  df_cms$Average.Covered.Charges and df_cms$Average.Medicare.Payments
    ## t = 485.66, df = 163060, p-value < 2.2e-16
    ## alternative hypothesis: true correlation is not equal to 0
    ## 95 percent confidence interval:
    ##  0.7669354 0.7709033
    ## sample estimates:
    ##       cor 
    ## 0.7689268

    The test shows a cor value of 0.767. Values between 0.7 and 0.9 state that there is a steady uphill linear relationship between Average Covered Charges and Average Medicare payments.

    For the moment we will only explore the correlation between the two variables, and in a subsequent analysis with more detail and in-depth data we will explore other analysis between these two variables

    Total Discharges vs Average Total Payments
    TD-vs-TP:Plots

    This first graph is going to deliver an overall picture of the relation between the two data columns; including data patterns.

    This graph shows that the majority of total discharges are fewer than 250 with an increase average total payments are mostly under 25000$.

    The next graph will plot the same graph as above but with limits of x-value 500 and y-value 50000$, as well as four separate graphs with the data frame gather selectively from the regions of the USA.

    All graphs share a similar pattern than the one shared from the pattern mention in the general and global graph. There are two distinct protrusions of data reflecting an increase of total discharges with approximately an average total payments of about 10000$; with a constant increase of discharges, similar to a horizontal line. The pattern is repeated about 15000$ Average Total Payments; it is less apparent in the West Region.

    TP-vs-MP:Analysis

    We start the analysis by determining the Correlation between the two variables. We will perform a Pearson test between the two variables without subdividing the data into regions.

    ## 
    ##  Pearson's product-moment correlation
    ## 
    ## data:  df_cms$Total.Discharges and df_cms$Average.Total.Payments
    ## t = -6.4699, df = 163060, p-value = 9.833e-11
    ## alternative hypothesis: true correlation is not equal to 0
    ## 95 percent confidence interval:
    ##  -0.02087213 -0.01116732
    ## sample estimates:
    ##        cor 
    ## -0.0160201

    The test shows a cor value of -0.016. Values between 0 and -0.3 state that there is a weak negative linear relationship between Total Discharges and Average Total Payments.

    It is unusual that this the correlation found between these variables; especially since these two variables involve the total of discharges and average total payments.

    For the moment we will only explore the correlation between the two variables, and in a subsequent analysis with more detail and in-depth data, we will explore other analysis between these two variables.

    Average Total Payments vs Average Medicare Payments
    TP-vs-MP:Plots

    This first graph is going to deliver an overall picture of the relation between the two data columns; including data patterns.

    This graph shows a linear-like pattern. The majority of the entries are within the liner pattern. Also, the majority of entries are within x-value and y-value limits of 50000$.

    All graphs show the same pattern. There are mild differences in the width. Further analysis could reveal more about this pattern.

    TP-vs-MP:Analysis

    We start the analysis by determining the Correlation between the two variables. We will perform a Pearson test between the two variables without subdividing the data into regions.

    ## 
    ##  Pearson's product-moment correlation
    ## 
    ## data:  df_cms$Average.Total.Payments and df_cms$Average.Medicare.Payments
    ## t = 2746.3, df = 163060, p-value < 2.2e-16
    ## alternative hypothesis: true correlation is not equal to 0
    ## 95 percent confidence interval:
    ##  0.9892588 0.9894642
    ## sample estimates:
    ##      cor 
    ## 0.989362

    The test shows a cor value of 0.989. Values between 0.7 and 0.9 state that there is a steady uphill linear relationship between Total Payments and Average Medicare Payments.

    For the moment we will only explore the correlation between the two variables, and in a subsequent analysis with more detail and in-depth data, we will explore other analysis between these two variables.

    In the region of the Northeast of the USA, we see that the Interquartile Range is between ~18 for the 25% percentile and ~67 for the 75% percentile.

    In the region of the Midwest of the USA, we see that the Interquartile Range is between ~18 for the 25% percentile and ~67 for the 75% percentile.

    In the region of the South of the USA, we see that the Interquartile Range is between ~18 for the 25% percentile and ~50 for the 75% percentile.

    In the region of the West of the USA, we see that the Interquartile Range is between ~15 for the 25% percentile and ~42 for the 75% percentile.

    Final Plots and Summary

    This section only handles aesthetic modification of these images. The three images chosen for this section are:

    1. Average Covered Charges vs Average Medicare Payments
    2. Average Total Payments vs Average Medicare Payments.
    3. Total Discharges by Census Region Division.

    Reflection

    The Dataset is one that shows a significant amount of data towards Medicare payment information in hospitals throughout the USA. It consists of 16 variables and 163065 entries. We explore the various relationships between total payments, Medicare payments, Total Discharges, and regions within the USA.

    The dataset is vast and much more analysis could be performed, for example, the differences in payment from Medicare and Total payments made per procedure per hospitals in the same region as well as through states or regions in the USA.

    The vastness of this dataset it is on itself a struggle, the analysis made in here it is mainly superficial and many more aspects could be analysed further. However, this analysis does give an overview picture of the distribution of payments and discharges stored and how that is represented in the different regions of the USA.

    Furthermore, another difficulty obtain was to allocate the data into regions, divisions and so on. No python package is readily available that provides the information required, therefore it needed to be done by hand and checked in a multitude of sources, so the information was accurate.

    One question that could further be asked form this dataset is how the different medical procedures are between payments(All the different types of payment related variables in this dataset) and discharges among states, regions and divisions. It is understood that different hospitals have a different payment structure or charge structure per procedure, so a further analysis could show this pattern and how much more significant such a difference is.